Abstract

The figure-ground segmentation of humans in images captured in natural environments is an outstanding open problem due to the presence of complex backgrounds, articulation, varying body proportions, partial views and viewpoint changes. In this work we propose class-specific segmentation models that leverage parametric max-flow image segmentation and a large dataset of human shapes. Our contributions are as follows: (1) formulation of a submodular energy model that combines class-specific structural constraints and data-driven shape priors, within a parametric max-flow optimization methodology that systematically computes all breakpoints of the model in polynomial time; (2) design of a data-driven, class-specific fusion methodology, based on matching against a large training set of exemplar human shapes (100,000 in our experiments), that allows the shape prior to be constructed on the fly, for arbitrary viewpoints and partial views; (3) demonstration of state-of-the-art results on two challenging datasets, H3D and MPII (where we have added figure-ground segmentation annotations), where we substantially improve on the first-ranked hypothesis estimates of mid-level segmentation methods, by 20%, with hypothesis set sizes that are up to one order of magnitude smaller.


1. Introduction 

Detecting and segmenting people in real-world environments are central problems with applications in indexing, surveillance, 3D reconstruction and action recognition. Prior work in 3D human pose reconstruction from monocular images [43] [?] [21], as well as more recent, successful RGB-D sensing systems based on Kinect [?], have shown that the availability of a figure-ground segmentation opens paths towards robust and scalable systems for human sensing. Despite substantial progress, figure-ground segmentation in RGB images remains extremely challenging, because people are observed from a variety of viewpoints, have a complex articulated skeletal structure, varying body proportions and clothing, and are often partially occluded by other people or objects in the scene. The complexity of the background further complicates matters, particularly as any limb decomposition of the human body leads to parts that are relatively regular but not sufficiently distinctive, even when spatial connectivity constraints are enforced [47]. Setting aside appearance inhomogeneity and color variability due to clothing, which can overlap the background distribution significantly, it is well known that many of the generic parallel-line (ribbon) detectors designed to detect human limbs fire at high false positive rates in the background. This has motivated work towards detecting more distinctive part configurations, without restrictive assumptions on part visibility (e.g. full or upper view of the person), for which poselets [7] have been a successful example. However, besides the relatively high false positive rates typical in detection, the transition from a bounding box of the person to a full segmentation of the human body is not straightforward. The challenge is to balance, on one hand, sufficient flexibility to represent variability due to viewpoint, partial views and articulation, and, on the other hand, sufficient constraints to obtain segmentations that correspond to meaningful human shapes, all while relying on region or structural human body part detectors that may only be partial or not always spatially accurate.

In this work we attempt to connect two relevant, recent lines of work for the segmentation of people in real images. We rely on bottom-up figure-ground generation methods and region-level person classifiers in order to identify promising hypotheses for further processing. In a second pass we set up informed constraints towards (human) class-specific figure-ground segmentation by leveraging skeletal information and data-driven shape priors computed on the fly, by matching region candidates against exemplars of a large, recently introduced human motion capture dataset containing 3D and 2D semantic skeleton information of people, as well as images and figure-ground masks from background subtraction (Human3.6M [23]). By exploiting globally optimal parametric max-flow energy minimization solvers, this time based on a class-dependent (as opposed to generic and regular) foreground seeding process [?], we show that we can considerably improve the state of the art. To our knowledge this is one of the first formulations for class-specific segmentation that can, in principle, handle multiple viewpoints and any partial view of the person. It is also one of the first to leverage a large dataset of human shapes together with semantic structural information, which until recently has not been available. We show that such constraints are critical for accuracy, robustness, and computational efficiency.

1.1. Related Work 

The literature on segmentation is huge, even when considered only under sub-categories like top-down (class-specific) and bottom-up segmentation. Humans are of sufficient interest to warrant specialized methodology, if that proves to be effective [45] [46] [?]. One approach is to consider shape as a category-specific property and integrate it within models that are driven by bottom-up processing [6] [1] [27] [10] [?]. Pishchulin et al. [38] develop pictorial structure formulations constrained by poselets, focusing on improving the response quality of an articulated part-based human model. The use of priors based on exemplars has also been explored, in a data-driven process. Both [40] and [39] focus on a matching process in order to identify exemplars that correspond to similar scene or object layouts, which are then used in a graph cut process that enforces spatial smoothness and provides a global solution. Our approach is related to such methods, but we use a novel data-driven prior construction, enforce structural constraints adapted to humans, and search the state space exhaustively by means of parametric max-flow. In contrast to the priors used in [40] [33], which require a more repeatable scene layout, we focus on a prior generation process that can handle a diverse set of viewpoints and arbitrary partial views, not known a priori and different across the detected instances.

Methods like [26] resemble ours in their reliance on a detection stage and on the principle of matching the window representation against a training set where figure-ground segmentations are available, then optimizing an energy function based on graph cuts. Our window representation contains additional detail, which makes it possible to match exemplars based on the semantic content identified. Our matching and shape prior construction are optimized for humans, in contrast to the generic ones used in [26] (which can, however, segment any object, not just people, our focus here¹). We use a large prior set of structurally annotated human shapes, and search the state space using a different, parametric multiple-hypothesis scheme. Our prior construction uses, among other elements, a Procrustes alignment not unlike [20], but differently: (1) we use it for shape prior construction (input dependent, on the fly) within an energy optimizer, as opposed to object detection (classification, construction per class) in [20]; (2) we only use instances that align well with the query, reflecting accurate shape modeling, as opposed to fusing the top-k instances to capture class variability in [20]. An alternative, interesting formulation for object segmentation with shape priors is branch-and-mincut [31], which proposes a branch-and-bound procedure in the compound space of binary segmentations and hierarchically organized shapes. However, the bounding process used for efficient search in shape space relies on knowledge of the type of shapes expected and of their full visibility. We focus on a different optimization and modeling approach that can handle arbitrary occlusion patterns of shape. Our prior constraint for optimization is generated on the fly by fusing the visible exemplar components, following a structural alignment scheme.

Recently there has been a resurrection of bottom-up segmentation methods based on multiple proposal generation, with surprisingly good results considering the low-level processing involved. Some of these methods generate segment hypotheses either by combining the superpixels of a hierarchical clustering method [?] [37] [?], by varying the segmentation parameters [18], or by searching an energy model parametrically, using graph cuts [?] [18] [?] [35] [14]. Most of the latter techniques use mid-level shape priors for selection, either following hypothesis generation [?] or during the process. Some methods provide a ranking, diversification and compression of hypotheses, using e.g. Maximal Marginal Relevance (MMR) diversification [13] [18], whereas others report an unordered set [?]. Hypothesis pool sizes in the 1,000-10,000 range in the expansionary phase, and compressed models of 100-1,000 hypotheses following the application of trained rankers (operating on mid-level features extracted from segments) with diversification, are typical, with variance due to image complexity and edge structure. While prior work has shown that such hypothesis pools can contain remarkably good quality segments (intersection over union (IoU) scores of 60-80% are not uncommon), this leaves considerable room for improvement, particularly since, sooner or later, one inevitably faces the burden of decision making: selecting one hypothesis to report. It is then not uncommon for performance to drop sharply, to 40%. This indicates that constraints and prior selection methods leading to more compact, better quality hypothesis sets are necessary. Such issues are addressed in the current work.

¹ Notice, however, that the methodology we propose would be applicable to objects other than people. Here we focus on people because, for now, large training sets of segmented shapes with structural annotations are available only for them, through Human3.6M [23]. However, as large datasets for other object categories emerge, we expect our methodology to generalize well. In this respect, our results on a challenging visual category, humans, are indicative of the performance bounds one can expect.

2. Methodology 

We will consider an image as $I : V \to \mathbb{R}^3$, where $V$ represents the set of nodes, each associated with a pixel in the image, and the range is the associated intensity (RGB) vector. The image is modeled as a graph $G = (V, E)$. We partition the set of nodes $V$ into two disjoint sets $V_f$ and $V_b$, which represent the assignments of pixels to foreground and background, respectively. $E$ is the subset of edges of the graph $G$ that reflects the connections between adjacent pixels. The formulation we propose relies on object (or foreground) structural skeleton constraints obtained from person detection and 2D localization (in particular, the identification of keypoints associated with the joints of the human body, and the resulting set of nodes corresponding to the human skeleton, obtained by connecting keypoints, $T \subset V$), as well as on a data-driven human shape fusion prior $S : V \to [0, 1]$, constructed ad hoc by fusing configurations similar to the one detected, based on a large dataset of human shapes with associated 2D skeleton semantics (see §2.1 for details). The energy function defined over the graph $G$ and the labeling $X = \{x_u\}_{u \in V}$ is:


$$E_\lambda(X) = \sum_{u \in V} U_\lambda(x_u) + \sum_{(u,v) \in E} V_{uv}(x_u, x_v) \qquad (1)$$

where

$$U_\lambda(x_u) = D_\lambda(x_u) + S(x_u),$$

with $\lambda \in \mathbb{R}$, and unary potentials given by semantic foreground constraints $V_f \leftarrow T$:

$$D_\lambda(x_u) = \begin{cases} 0 & \text{if } x_u = 1,\ u \notin V_b \\ \infty & \text{if } x_u = 1,\ u \in V_b \\ \infty & \text{if } x_u = 0,\ u \in V_f \\ f(x_u) + \lambda & \text{if } x_u = 0,\ u \notin V_f \end{cases} \qquad (2)$$

The foreground bias is implemented as a cost incurred by the assignment of non-seed pixels to the background, and consists of a pixel-dependent value $f(x_u)$ and a uniform offset $\lambda$. Two different functions $f(x_u)$ are used alternatively. The first is constant and equal to 0, resulting in a uniform (variable) foreground bias. The second function uses color. Specifically, RGB color distributions $P_f(x_u)$ on seed $V_f$ and $P_b(x_u)$ on seed $V_b$ are estimated and used to derive $f(x_u) = \ln P_f(x_u) - \ln P_b(x_u)$. The probability of pixel $i$ belonging to the foreground is defined as $p_f(i) = \exp(-\gamma \cdot \min_j \| I(i) - I(j) \|)$, with $\gamma$ a scaling factor, and $j$ indexing representative pixels in the seed region, selected as centers resulting from a k-means algorithm ($k = 5$ in all of our experiments). The background probability is defined similarly.

The pairwise term $V_{uv}$ penalizes the assignment of different labels to similar neighboring pixels:

$$V_{uv}(x_u, x_v) = \begin{cases} 0 & \text{if } x_u = x_v \\ g(u, v) & \text{if } x_u \neq x_v \end{cases} \qquad (3)$$

with the similarity between adjacent pixels given by $g(u, v) = \exp\left(-\frac{\max(G_b(u), G_b(v))}{\sigma^2}\right)$. $G_b$ returns the output of the multi-cue contour detector [35] [32] at a pixel. The boundary sharpness parameter $\sigma$ controls the smoothness of the pairwise term.

Figure 2. Processing steps of our segmentation methods based on Constrained Parametric Problem Dependent Cuts (CPDC) with Shape Matching, Alignment and Fusion (MAF).

The energy function defined by (1) is submodular and can be optimized using parametric max-flow, in order to obtain all breakpoints of $E_\lambda(X)$ as a function of $\lambda$ in polynomial time.
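The color-based bias $f(x_u)$ and the pairwise weights $g(u, v)$ can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes $f$ is the log-ratio of the foreground and background seed probabilities, uses a small hand-written k-means for the seed color centers, treats the values of `gamma` and `sigma` and all function names as hypothetical, and takes an array `gb` in $[0, 1]$ as a stand-in for the contour detector output $G_b$.

```python
import numpy as np


def kmeans_centers(colors, k=5, iters=10, seed=0):
    """Tiny k-means on an (n, 3) array of RGB values; returns at most k centers."""
    rng = np.random.default_rng(seed)
    k = min(k, len(colors))
    centers = colors[rng.choice(len(colors), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # Assign every seed color to its nearest center, then recompute the means.
        d = np.linalg.norm(colors[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for c in range(k):
            members = colors[labels == c]
            if len(members):
                centers[c] = members.mean(axis=0)
    return centers


def seed_probability(image, seed_mask, gamma=0.05, k=5):
    """p(i) = exp(-gamma * min_j ||I(i) - I(j)||), j ranging over k-means centers of the seed."""
    centers = kmeans_centers(image[seed_mask].astype(float), k=k)
    dist = np.linalg.norm(image[..., None, :].astype(float) - centers, axis=-1)
    return np.exp(-gamma * dist.min(axis=-1))


def color_bias(image, fg_seed, bg_seed, gamma=0.05, k=5, eps=1e-12):
    """Assumed log-ratio form: f(x_u) = ln p_f(u) - ln p_b(u)."""
    p_f = seed_probability(image, fg_seed, gamma, k)
    p_b = seed_probability(image, bg_seed, gamma, k)
    return np.log(p_f + eps) - np.log(p_b + eps)


def pairwise_weights(gb, sigma=0.1):
    """g(u, v) = exp(-max(Gb(u), Gb(v)) / sigma^2) for right and down 4-neighbours."""
    right = np.exp(-np.maximum(gb[:, :-1], gb[:, 1:]) / sigma**2)
    down = np.exp(-np.maximum(gb[:-1, :], gb[1:, :]) / sigma**2)
    return right, down
```

A positive `color_bias` makes assigning a pixel to the background more expensive, so pixels that look like the foreground seed are pulled into the foreground, while a strong contour response drives $g(u, v)$ towards 0 and makes cutting there cheap.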


Given the general formulation in (1) and (2), the key problems to address are: (a) the identification of a putative set of person regions and structural constraint hypotheses $T$; (b) the construction of an effective, yet flexible, data-driven human shape prior $S$, based on a sufficiently diverse dataset of people shapes and skeletal structure, given estimates for $T$; (c) minimization of the resulting energy model (1). We address (a), without loss of generality, using a human region classifier (any other set of structural, problem-dependent detectors can be used here, e.g. face and hand detectors based on skin color models, or poselets). We address (b) using a methodology that combines a large dataset of human pose shapes and body skeletons, collected from Human3.6M [23], with shape matching, alignment and fusion analysis, in order to construct the prior on the fly for the instance being analyzed. We refer to a model that leverages both problem-dependent structural constraints $T$ and a data-driven shape prior $S$, in a single joint optimization problem, as Constrained Parametric Problem Dependent Cuts with Shape Matching, Alignment and Fusion (CPDC-MAF). The integration of bottom-up region detection constraints with a shape prior construction is described in §2.1.

[Figure 1 panels: example segmentations with IoU scores: CPMC 0.41, CPDC-MAF-POSELETS 0.38, CPDC-MAF 0.91.]

Figure 1. First row: our Shape Matching, Alignment and Fusion (MAF) construction, based on semantic matching, structural alignment and clipping, followed by fusion, to reflect the partial view. Notice that the prior construction allows us to match partial views of a putative human detected segment to fully visible exemplars in Human3.6M. This allows us to handle arbitrary patterns of occlusion. We can thus create a well adapted prior, on the fly, given a candidate segment. Second and third rows: examples of segmentations obtained by several methods (including the proposed ones), with intersection over union (IoU) scores and ground truth shown. See Fig. ? for additional image segmentation results.

The CPDC-MAF model can be optimized in polynomial time using parametric max-flow, in order to obtain all breakpoints of the associated energy model (addressing (c)).
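To make the optimization concrete, the sketch below minimizes the binary energy of (1) and (2) with a plain Edmonds-Karp max-flow for a discrete list of $\lambda$ values. This is a deliberate simplification: parametric max-flow recovers all breakpoints exactly and reuses flow across $\lambda$, whereas this sketch re-solves from scratch on a small grid of values. How $S(x_u)$ enters the unary term is an assumption here (penalizing disagreement with the prior, weighted by a hypothetical `shape_weight`), and all function names are hypothetical.

```python
import math
from collections import deque


def min_cut_labels(unary0, unary1, edges):
    """Exact binary graph cut via Edmonds-Karp max-flow.

    unary0[u] / unary1[u]: cost of labelling node u background (0) / foreground (1).
    edges: {(u, v): g_uv} symmetric pairwise weights. Seeds must be disjoint.
    Returns the set of nodes on the source side, i.e. labelled foreground.
    """
    n = len(unary0)
    s, t = n, n + 1
    cap = {i: {} for i in range(n + 2)}

    def add(a, b, c):
        cap[a][b] = cap[a].get(b, 0.0) + c
        cap[b].setdefault(a, 0.0)  # make sure the residual reverse edge exists

    for u in range(n):
        m = min(unary0[u], unary1[u])  # shift both costs so capacities are >= 0
        add(s, u, unary0[u] - m)       # paid if u ends on the sink (background) side
        add(u, t, unary1[u] - m)       # paid if u stays on the source (foreground) side
    for (u, v), g in edges.items():
        add(u, v, g)
        add(v, u, g)

    while True:
        # Breadth-first search for a shortest augmenting path in the residual graph.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            a = queue.popleft()
            for b, c in cap[a].items():
                if c > 1e-12 and b not in parent:
                    parent[b] = a
                    queue.append(b)
        if t not in parent:
            break  # no augmenting path left: parent holds the source side of the min cut
        path, b = [], t
        while parent[b] is not None:
            path.append((parent[b], b))
            b = parent[b]
        flow = min(cap[a][b] for a, b in path)
        for a, b in path:
            cap[a][b] -= flow
            cap[b][a] += flow
    return {u for u in parent if u < n}


def cpdc_unaries(f, lam, fg_seeds, bg_seeds, shape=None, shape_weight=1.0):
    """Unary costs of Eq. (2), optionally plus a hypothetical shape-prior term."""
    u0, u1 = [], []
    for u in range(len(f)):
        c0 = math.inf if u in fg_seeds else f[u] + lam  # x_u = 0 (background)
        c1 = math.inf if u in bg_seeds else 0.0         # x_u = 1 (foreground)
        if shape is not None:  # assumption: penalise disagreement with S(u) in [0, 1]
            c0 += shape_weight * shape[u]
            c1 += shape_weight * (1.0 - shape[u])
        u0.append(c0)
        u1.append(c1)
    return u0, u1


def lambda_sweep(f, lambdas, fg_seeds, bg_seeds, edges, shape=None):
    """Re-solve the cut for each lambda and keep the distinct foreground sets."""
    solutions = []
    for lam in lambdas:
        fg = frozenset(min_cut_labels(*cpdc_unaries(f, lam, fg_seeds, bg_seeds, shape), edges))
        if fg not in solutions:
            solutions.append(fg)
    return solutions
```

On a four-node chain with node 0 as foreground seed and node 3 as background seed, a strongly negative $\lambda$ keeps only the seed in the foreground, while a positive $\lambda$ grows the foreground up to the background seed, which is the nested behavior the parametric sweep exploits.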


2.1. Data-Driven Shape Matching, Alignment and Fusion (MAF)

We aim to obtain an improved figure-ground segmentation for persons by combining bottom-up and top-down, class-specific information. We initialize our proposal set using CPMC [13]. While any figure-ground segmentation proposal method can, in principle, be employed, we chose CPMC due to its performance and because our method can be viewed as its generalization with problem-dependent seeds and shape priors. We filter the top $N$ segment candidates using an O2P [12] region classifier trained to respond to humans, using examples from Human3.6M, to obtain $\mathcal{D} = \{d_i = \{z_i, b_i\},\ i = 1, \dots, N\}$. Each candidate segment is represented by a binary mask $z_i$, where 1 stands for foreground and 0 for background, and a bounding box $b_i \in \mathbb{R}^4$, with $b = (m, n, w, h)$: $m$ and $n$ represent the image coordinates of the bottom-left corner of the bounding box, and $w$ and $h$ its width and height.
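The candidate representation $d_i = \{z_i, b_i\}$ can be held in a small container, as sketched below. The `Candidate` class and `make_candidate` helper are hypothetical, and the sketch assumes that image rows grow downward, so that the bottom-left corner is the minimum column and the maximum row of the foreground pixels.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Candidate:
    z: np.ndarray  # binary mask: 1 = foreground, 0 = background
    b: tuple       # bounding box (m, n, w, h)


def make_candidate(z):
    """Build d = {z, b} from a binary mask (assumed convention: rows grow downward)."""
    rows, cols = np.nonzero(z)
    m, n = int(cols.min()), int(rows.max())  # bottom-left corner as (x, y)
    w = int(cols.max() - cols.min() + 1)
    h = int(rows.max() - rows.min() + 1)
    return Candidate(z=np.asarray(z, dtype=np.uint8), b=(m, n, w, h))
```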

We will use the set of human region candidates in order to match against a set of human shapes and construct a shape prior. There are challenges, however, particularly being able to: (1) access a sufficiently representative set of human shapes to construct the prior; (2) be sufficiently flexible so that human shapes from the dataset that are very different from the shape being analyzed do not negatively impact the estimates; (3) handle partial views: while we rely on bottom-up proposals that can handle partial views, using a shape prior that can only represent, e.g., full or upper-body views would not be effective.
























We address (1) by employing a dataset of 100,000 human shapes together with the corresponding skeleton structure, sub-sampled from the recently created Human3.6M dataset [23]; (2) by employing a matching, alignment and fusion technique between the current segment and the individual exemplar shapes in the dataset, where shapes and structures that cannot be matched and aligned properly are discarded; (3) by leveraging the implicit correspondences available across training shapes, at the level of local shape matches, by only aligning and warping those components of the exemplar shapes that can be matched to the query, at the level of joints. A sample flow of our entire method is visualized in Figure 1 (first row) and Figure 2.

Boundary Point Sampling: Given a bottom-up figure-ground proposal represented as a binary mask $z$ (of a candidate $d \in \mathcal{D}$), we sample the image coordinates of the boundary points of the foreground segment. We thus obtain a set of 2D points $p_j$, $j = 1, \dots, K$, with $p_j \in \mathbb{R}^2$, where $p_j = (x_j, y_j)$. We loop through the shapes of our human shape dataset (Human3.6M) and, for each shape, we rotate and scale it so that it has the same orientation and scale as the foreground candidate segment, and sample its boundary points. We thus obtain sets of 2D points $q_{jl}$, $j = 1, \dots, K$, with $l = 1, \dots, L$, where $L$ represents the number of poses in the shape-pose dataset, in our case $L = 100,000$.
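The boundary sampling step can be sketched as below: a pixel is on the boundary if it is foreground and has at least one background 4-neighbour, and $K$ points are then picked at evenly spaced positions. This is an illustrative sketch, not the authors' code: the points are taken in raster order rather than along the contour, the default `K=100` is a hypothetical value, and `normalize_to` is a hypothetical stand-in for the rotate-and-scale step that only matches scale and centroid (no rotation is estimated).

```python
import numpy as np


def boundary_points(mask):
    """(x, y) coordinates of foreground pixels that have a 4-neighbour in the background."""
    m = np.asarray(mask, dtype=bool)
    padded = np.pad(m, 1, constant_values=False)
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1]     # up and down neighbours
                & padded[1:-1, :-2] & padded[1:-1, 2:])  # left and right neighbours
    rows, cols = np.nonzero(m & ~interior)
    return np.stack([cols, rows], axis=1).astype(float)


def sample_boundary(mask, K=100):
    """Pick K evenly spaced boundary points (raster order, not contour order)."""
    pts = boundary_points(mask)
    if len(pts) <= K:
        return pts
    idx = np.linspace(0, len(pts) - 1, K).round().astype(int)
    return pts[idx]


def normalize_to(points, reference):
    """Hypothetical normalization: match the bounding-box height and the centroid
    of `reference`; rotation is not handled in this sketch."""
    h_p = np.ptp(points[:, 1]) or 1.0
    h_r = np.ptp(reference[:, 1]) or 1.0
    return (points - points.mean(axis=0)) * (h_r / h_p) + reference.mean(axis=0)
```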

Shape Matching and Transform Matrix: We employ the shape context descriptor [?] at each position $p_j$ of the candidate segment and each position $q_{jl}$ of each shape in the dataset. We evaluate a distance between the resulting descriptors to select the indexes $l$ with enough well-matched boundary points to estimate an affine transform.
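A simplified shape context descriptor and the descriptor matching can be sketched as follows. The sketch assumes a log-polar histogram with 5 radial and 12 angular bins, distances normalized by the mean pairwise distance, and a chi-square cost; these bin counts, the radius range and the threshold `mu` are hypothetical choices, not values given in the paper. `matched_pairs` returns the set $J$ of pairs whose cost falls below `mu`.

```python
import numpy as np


def shape_context(points, n_r=5, n_theta=12, r_inner=0.125, r_outer=2.0):
    """Log-polar histogram of the relative positions of all other points (simplified)."""
    diff = points[None, :, :] - points[:, None, :]  # diff[i, j] = p_j - p_i
    dist = np.linalg.norm(diff, axis=2)
    dist = dist / dist[dist > 0].mean()             # scale normalization
    theta = np.arctan2(diff[..., 1], diff[..., 0]) % (2 * np.pi)
    r_edges = np.logspace(np.log10(r_inner), np.log10(r_outer), n_r + 1)
    n = len(points)
    desc = np.zeros((n, n_r * n_theta))
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            r_bin = np.searchsorted(r_edges, dist[i, j], side="right") - 1
            if 0 <= r_bin < n_r:  # ignore points outside the radial range
                t_bin = int(theta[i, j] / (2 * np.pi) * n_theta) % n_theta
                desc[i, r_bin * n_theta + t_bin] += 1
        total = desc[i].sum()
        if total > 0:
            desc[i] /= total  # normalize each histogram to sum to 1
    return desc


def chi2_cost(h1, h2, eps=1e-12):
    """Pairwise chi-square costs between the rows of h1 (n, d) and h2 (m, d)."""
    a, b = h1[:, None, :], h2[None, :, :]
    return 0.5 * ((a - b) ** 2 / (a + b + eps)).sum(axis=2)


def matched_pairs(desc_q, desc_p, mu=0.3):
    """J = {(x, y) : cost(f(q_x), f(p_y)) < mu}, keeping the best q_x for every p_y."""
    cost = chi2_cost(desc_q, desc_p)
    best_q = cost.argmin(axis=0)
    return [(int(x), y) for y, x in enumerate(best_q) if cost[x, y] < mu]
```

An exemplar whose `matched_pairs` list has too few entries cannot support a transform estimate and would simply be skipped.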

We apply a 2D Procrustes transform with 5 degrees of freedom (rotation, anisotropic scaling including reflections, and translation) to $q_{jl}$ in order to align each shape in the dataset with the corresponding boundary points. This results in a $3 \times 3$ transformation matrix $W_l$ and a transform error $e_l$, which represents the Euclidean distance between the boundary points $p_j$ and the Procrustes-transformed ones, $W_l \cdot q_{jl}$, in the image plane.
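The alignment and its error can be illustrated with a least-squares fit in homogeneous coordinates. Note that this sketch swaps in a general 6-degree-of-freedom affine fit, which is easier to write in closed form, for the 5-degree-of-freedom Procrustes transform described above; the error `e` is taken, as an assumption, to be the mean Euclidean distance between the query points and the transformed exemplar points. The function name `fit_affine` is hypothetical.

```python
import numpy as np


def fit_affine(q, p):
    """Least-squares W (3x3, homogeneous) minimising sum_j ||p_j - W q_j||^2.

    q, p: (n, 2) matched exemplar and query points, n >= 3 and not collinear.
    Returns W and the mean Euclidean alignment error e.
    """
    n = len(q)
    Q = np.hstack([q, np.ones((n, 1))])        # homogeneous exemplar points (n, 3)
    A, *_ = np.linalg.lstsq(Q, p, rcond=None)  # (3, 2) solution of Q @ A = p
    W = np.eye(3)
    W[:2, :] = A.T                             # last row stays (0, 0, 1)
    residual = p - Q @ A
    e = float(np.linalg.norm(residual, axis=1).mean())
    return W, e
```

The requirement of at least three non-collinear matches mirrors the $|J| > 2$ test in Algorithm 1: with fewer matches the transform is under-determined.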

Prior Shape Selection and Warping: In order to determine which prior shapes are relevant for the current detected query, we identify the subset $\Gamma$ of indexes in the dataset whose transformation errors are smaller than a given threshold $\epsilon$. We thus obtain the corresponding figure-ground masks $m_t$, $t \in \Gamma$. For each mask $m_t$ we select the coordinates of the foreground pixels and warp them using the transform matrix $W_t$ obtained from the 2D coordinate alignment. We apply the same procedure to the attached skeleton configuration of the corresponding mask. We thus obtain the coordinates of the foreground pixels of the transformed mask $\tilde{m}_t$ and the transformed skeleton coordinates $\tilde{d}_t$.
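Selection and warping can be sketched as follows: keep the exemplars with $e_l < \epsilon$, then forward-warp the foreground pixels of each kept mask with its $3 \times 3$ matrix, rounding to the nearest pixel and clipping to the image, and apply the same matrix to the joints. The function names are hypothetical. Forward warping can leave holes when the transform enlarges the shape; a full implementation might use inverse warping instead.

```python
import numpy as np


def select_exemplars(errors, eps):
    """Gamma = {l : e_l < eps}."""
    return [l for l, e in enumerate(errors) if e < eps]


def warp_mask(mask, W, out_shape):
    """Forward-warp the foreground pixels of an exemplar mask with W (nearest pixel)."""
    rows, cols = np.nonzero(mask)
    pts = np.stack([cols, rows, np.ones_like(cols)]).astype(float)  # homogeneous (x, y, 1)
    x, y, _ = W @ pts
    x, y = np.rint(x).astype(int), np.rint(y).astype(int)
    keep = (x >= 0) & (x < out_shape[1]) & (y >= 0) & (y < out_shape[0])  # clip to the image
    out = np.zeros(out_shape, dtype=float)
    out[y[keep], x[keep]] = 1.0
    return out


def warp_joints(joints, W):
    """Apply W to (J, 2) joint coordinates given as (x, y)."""
    h = np.hstack([joints, np.ones((len(joints), 1))])
    return (h @ W.T)[:, :2]
```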

Prior Shape Fusion: We compute the mean of the entire set of transformed masks, thus obtaining a MAF prior $S$ corresponding to the detection $d$, as seen in Figure ? (second row). The values of the shape prior mask range from 0 to 1, corresponding to background and foreground probabilities, respectively. We also compute the mean of the entire set of transformed skeletons, thus obtaining a configuration of keypoints $B$, in which each keypoint $(x, y)$ gives the image coordinates of a warped joint from Human3.6M. This can be used to obtain a problem-dependent mask $m$ as follows. Initially we set the mask to have the same dimensions as the entire image, filled with 0. We use Bresenham's algorithm to draw a line between semantically adjacent joints, for example left elbow to left wrist, right hip to right knee, and so on, setting the pixels on these lines to 1. We assign the set of skeleton nodes to the foreground as $T = \{i \in V \mid m(i) = 1\}$. This entire procedure of obtaining the shape prior information (mask and skeleton) is illustrated in Algorithm 1.
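Fusion and skeleton seeding can be sketched as below: $S$ and $B$ are plain averages over the selected exemplars, and the skeleton mask is rasterized with the standard integer Bresenham line algorithm. The `limbs` argument, a list of joint-index pairs such as (left elbow, left wrist), is hypothetical, since the joint indexing of the skeleton is not specified here.

```python
import numpy as np


def fuse(warped_masks, warped_joints):
    """S = mean of the warped masks (values in [0, 1]); B = mean of the warped skeletons."""
    return np.mean(warped_masks, axis=0), np.mean(warped_joints, axis=0)


def bresenham(x0, y0, x1, y1):
    """Integer pixels on the segment from (x0, y0) to (x1, y1), valid in all octants."""
    pts = []
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    while True:
        pts.append((x0, y0))
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:  # step in x
            err += dy
            x0 += sx
        if e2 <= dx:  # step in y
            err += dx
            y0 += sy
    return pts


def skeleton_mask(B, limbs, shape):
    """Draw a line between every pair of adjacent joints; pixels on the lines are set to 1."""
    m = np.zeros(shape, dtype=np.uint8)
    for a, b in limbs:
        (x0, y0), (x1, y1) = np.rint(B[a]).astype(int), np.rint(B[b]).astype(int)
        for x, y in bresenham(int(x0), int(y0), int(x1), int(y1)):
            if 0 <= y < shape[0] and 0 <= x < shape[1]:
                m[y, x] = 1
    return m
```

The foreground seed set $T$ is then simply the set of flat pixel indices where the mask equals 1, e.g. `np.flatnonzero(m)`.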

3. Experiments 

We test our methodology on two challenging datasets: H3D [8], which contains 107 images, and MPII [?], with 3799 images. In all cases we have figure-ground segmentation annotations available. For the MPII dataset, we generated the figure-ground human segment annotations ourselves. Both the H3D and the MPII datasets contain full and partial views of persons, as well as self-occlusion, and are extremely challenging.

We run several segmentation algorithms, including CPMC [14] as well as our proposed CPDC-MAF, where we use bottom-up person region detectors trained on Human3.6M with region descriptors based on O2P [12]. We also constructed a model referred to as CPDC-MAF-POSELETS, built using problem-dependent seeds based on a 2D pose detector instead of proposed segments from a figure-ground segmentation algorithm. While any methodology that provides body keypoints (parts or articulations) is applicable, we chose the poselet detector because it provides results under partial views of the body, or self-occlusions of certain joints, together with joint position estimates. Conditioned on a detection, we apply the same idea as in our CPDC-MAF, except that we use the detected skeletal keypoints to match against the exemplars in the Human3.6M dataset. A matching process based on semantic keywords (the body joints) is explicit, immediate (since joints are available both for the putative poselet detection and for the exemplar shapes in Human3.6M) and arguably simpler than matching shapes in the absence of skeletal information. The downside is that when the poselet detection is incorrect, the matching will be incorrect as well (notice, however, that alignments with a high error score following matching are discarded within the MAF process).

Algorithm 1 Calculate S and B (Shape Matching, Alignment and Fusion, MAF)
Require:
  d_i = {z, b} - candidate segment (mask z, bounding box b)
  d_l, l = 1, ..., L - 2D joint positions (Human3.6M)
  m_l, l = 1, ..., L - figure-ground masks (Human3.6M)
  L - number of poses (Human3.6M, we use L = 100,000)
  ε - threshold value for the transform error
  f(·) - shape context descriptor
  μ - threshold value for shape context descriptors
Ensure: S, B
  Sample boundary points p_j, j = 1, ..., K on z
  for l ∈ {1, ..., L} do
    Sample K boundary points q_jl, j = 1, ..., K on m_l
    J = {(x, y) ∈ N² | χ(f(q_xl), f(p_y)) < μ}
    if |J| > 2 then
      a_xy(W) = p_y - W · q_xl
      W_l = argmin_W Σ_{(x,y) ∈ J} a_xy(W)ᵀ a_xy(W)
      e_l = Euclidean alignment error between p_y and W_l · q_xl over J
    else
      e_l = ∞
    end if
  end for
  Γ = {l ∈ {1, ..., L} | e_l < ε}
  for t ∈ Γ do
    V_f - foreground pixels of m_t, V_b - background pixels of m_t, V = V_b ∪ V_f
    for u ∈ V do
      if u ∈ V_f then
        m̃_t(W_t · u) = 1
      else
        m̃_t(W_t · u) = 0
      end if
    end for
    d̃_t = W_t · d_t
  end for
  S = (1/|Γ|) Σ_{t ∈ Γ} m̃_t
  B = (1/|Γ|) Σ_{t ∈ Γ} d̃_t

For CPDC-MAF, we initialize bottom-up, using candidate segments from the CPMC pool selected based on their person ranking score after applying the O2P classifier. This is followed by a non-maximum suppression step, where we remove the lower-ranked segment of each pair with an overlap above 0.25. We use the MAF process to reject irrelevant candidates and to build shape prior masks and skeleton configuration seeds for the segments with good matches produced by the shape context descriptors. For each resulting shape prior and skeleton seed we run the CPDC-MAF model, and the resulting pools from all candidate segments are merged to obtain the human region proposals for the entire image.
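The non-maximum suppression step can be sketched as a greedy loop over segments sorted by decreasing person score. The sketch assumes that overlap is measured as mask IoU; the paper states only the 0.25 threshold, and the function names are hypothetical.

```python
import numpy as np


def mask_iou(a, b):
    """Intersection over union of two binary masks."""
    a, b = np.asarray(a, bool), np.asarray(b, bool)
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union else 0.0


def mask_nms(masks, scores, thresh=0.25):
    """Visit segments by decreasing score; drop any that overlaps a kept one above thresh."""
    order = np.argsort(scores)[::-1]
    keep = []
    for i in order:
        if all(mask_iou(masks[i], masks[j]) <= thresh for j in keep):
            keep.append(int(i))
    return keep
```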

For each testing setup, we report the mean values (computed over the entire testing dataset) of the intersection over union (IoU) scores between the first segment in the ranked pool and the ground-truth figure-ground segmentation of each image. We also report the mean IoU score of the pool segment with the best IoU with respect to the ground-truth figure-ground segmentation.
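The two reported statistics can be computed as sketched below: for each image, the IoU of the first (highest ranked) segment and the maximum IoU over the pool, each averaged over the test set. Function names are hypothetical, and each pool is assumed to be a non-empty list of binary masks already sorted by rank.

```python
import numpy as np


def iou(a, b):
    """Intersection over union of two binary masks."""
    a, b = np.asarray(a, bool), np.asarray(b, bool)
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union else 0.0


def first_and_best(ranked_pool, gt):
    """IoU of the first-ranked segment and best IoU over the whole pool, for one image."""
    scores = [iou(seg, gt) for seg in ranked_pool]
    return scores[0], max(scores)


def dataset_means(pools, gts):
    """Mean 'First' and mean 'Best' IoU over a test set."""
    firsts, bests = zip(*(first_and_best(p, g) for p, g in zip(pools, gts)))
    return float(np.mean(firsts)), float(np.mean(bests))
```

The gap between the two numbers measures how much a better ranking could gain without generating any new segments.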

Results for the different datasets are shown in Table 1. In turn, Figures 3 and 4 show plots of the segment pool sizes and of the IoU scores for the highest ranked segments generated by different methods, with image indexes sorted according to the best performing method (CPDC-MAF). Qualitative segmentation results for the various methods tested are given in Figure ?.


                        H3D Test Set [8]            MPII Test Set [?]
Method                  First   Best   Pool size    First   Best   Pool size
CPMC [13]               0.54    0.72   783          0.29    0.73   686
CPDC-MAF                0.60    0.72   77           0.55    0.71   102
CPDC-MAF-POSELETS       0.53    0.6    98           0.43    0.58   114

Table 1. Accuracy and pool size statistics for different methods, on data from H3D and MPII. We report the average IoU over the test set between the first segment of the ranked pool and the ground-truth figure-ground segmentation (First), the average IoU over the test set of the segment with the highest IoU with the ground-truth figure-ground segmentation (Best), and the average pool size (Pool size).

4. Conclusions 

We have presented class-specific image segmentation models that leverage human body part detectors based on bottom-up figure-ground proposals, parametric max-flow solvers, and a large dataset of human shapes. Our formulation leads to a submodular energy model that combines class-specific structural constraints and data-driven shape priors, within a parametric max-flow optimization methodology that systematically computes all breakpoints of the model in polynomial time. We also propose a data-driven, class-specific prior fusion methodology, based on shape matching, alignment and fusion, that allows the shape prior to be constructed on the fly, for arbitrary viewpoints and partial views. We demonstrate state-of-the-art results on two challenging datasets, H3D [8] and MPII [?], where we improve the first-ranked hypothesis estimates of mid-level segmentation methods by 20%, with pool sizes that are up to one order of magnitude smaller. In future work we will explore additional class-dependent seed generation mechanisms and plan to study the extension of the proposed framework to video.

[Figure 3 plot: Pool Size Distribution - MPII Test Set; x-axis: image index; legend (average pool size): CPMC 686.3361, CPDC-MAF-POSELETS 116.1392, CPDC-MAF 102.467.]

Figure 3. Size of the segmentation pool on MPII for various methods, along with the average pool size (in the legend). Notice the significant difference between the pool sizes of CPDC-MAF-POSELETS and CPDC-MAF compared to those of CPMC. CPMC pool sizes average about 700 segments, whereas the pool sizes of CPDC-MAF and CPDC-MAF-POSELETS are considerably smaller, around 100 segments.

Figure 4. IoU for the first segment from the ranked pool in MPII. The values for CPMC and CPDC-MAF-POSELETS have higher variance than those for CPDC-MAF, resulting in the performance drop reflected in their averages.